Parsing Heterogeneous Corpora with a Rich Dependency Grammar

Author

  • Achim Stein
Abstract

Philologist: I need to parse Old French texts of different types (verse, prose, dialects etc.). Do I have to train separate parser models? Computational Linguist: You won’t lose much if you train the parser on all the data you have. P: I can’t do the training myself. What can I expect from existing parser models? C: If the training corpus contained 12th century verse texts, you are best prepared for most flavours of Old French, including prose, except for the very oldest texts. P: And if I want to parse very old texts? C: Then the time lapse between your text and the training data should be as small as possible. P: A golden rule to go home with? C: Don’t train on prose if you want to parse verse. P: AOI.


Similar resources

The Effect of Morphology on Dependency Parsing of Persian

Data-driven systems can be adapted easily to different languages and domains. This trend has led to the introduction of data-driven approaches to dependency parsing. The only prerequisite of data-driven approaches is the existence of appropriate corpora that contain sentences and their associated dependency trees. Despite obtaining highly accurate results for the dependency parsing task in English langu...


Approaches for Learning Constraint Dependency Grammar from Corpora

This paper evaluates two methods of learning constraint dependency grammars from corpora: one uses the sentences directly and the other uses subgrammar expanded sentences. Learning curves and test set parsing results show that grammars generated directly from sentences have a low degree of parse ambiguity but at a cost of a slow learning rate and less grammar generality. Augmenting these senten...


Data-driven, PCFG-based and Pseudo-PCFG-based Models for Chinese Dependency Parsing

We present a comparative study of transition-, graph- and PCFG-based models aimed at illuminating more precisely the likely contribution of CFGs in improving Chinese dependency parsing accuracy, especially by combining heterogeneous models. Inspired by the impact of a constituency grammar on dependency parsing, we propose several strategies to acquire pseudo CFGs only from dependency annotations....


Exploiting Language Variants Via Grammar Parsing Having Morphologically Rich Information

In this paper, the development and evaluation of the Urdu parser are presented along with a comparison of existing resources for the language variants Urdu/Hindi. This parser was given a linguistically rich grammar extracted from a treebank. This context-free grammar with sufficient encoded information is comparable with the state-of-the-art parsing requirements for morphologically rich and cl...


Modeling Dependency Grammar with Restricted Constraints

In this paper, parsing with dependency grammar is modeled as a constraint satisfaction problem. A restricted kind of constraints is proposed, which is simple enough to be implemented efficiently, but which is also rich enough to express a wide variety of grammatical well-formedness conditions. We give a number of examples to demonstrate how different kinds of linguistic knowledge can be encoded...




Publication date: 2014